Identifying Relevant Full-Text Articles for GO Annotation Without MeSH Terms
Abstract
Gene Ontology (GO) is a controlled vocabulary. Given a gene product, GO enables scientists to describe clearly and unambiguously its specific molecular functions, the biological processes in which it is involved, and the cellular components to which it is localized. In this paper, we present our approach to identifying which papers contain experimental evidence warranting annotation with GO codes. The training data set contains 375 relevant full-text articles and 5,462 irrelevant ones, and the test data set contains 420 positive full-text articles and 5,623 negative ones. We treated this as a binary classification problem and employed Support Vector Machines (SVMs) to distinguish positive articles from negative ones. Features were extracted from the title, abstract, figure and table captions, and three standard sections: Results, Discussion, and Conclusion. Without incorporating MeSH (Medical Subject Headings) terms as features, our system achieved a Normalized Utility of 0.381.
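The abstract reports a Normalized Utility score without defining it. As a rough illustration, the sketch below assumes the linear utility commonly used for triage-style evaluations: each true positive earns a weight u_r, each false positive costs 1, and the raw score is divided by the score of a perfect system. The function name, the formula, and the weight of 20 are assumptions made for illustration; only the test-set counts come from the abstract.

```python
def normalized_utility(tp: int, fp: int, total_positives: int, u_r: float) -> float:
    """Linear utility divided by the best achievable utility (assumed definition)."""
    if total_positives <= 0:
        raise ValueError("total_positives must be positive")
    u_raw = u_r * tp - fp          # reward true positives, charge 1 per false positive
    u_max = u_r * total_positives  # perfect system: all positives found, no false positives
    return u_raw / u_max


# Test-set composition taken from the abstract.
POSITIVES, NEGATIVES = 420, 5623
U_R = 20  # assumed weight; the abstract does not state it

# Three reference systems that bracket any real classifier.
perfect = normalized_utility(POSITIVES, 0, POSITIVES, U_R)
accept_all = normalized_utility(POSITIVES, NEGATIVES, POSITIVES, U_R)
reject_all = normalized_utility(0, 0, POSITIVES, U_R)

print(f"perfect={perfect:.3f} accept_all={accept_all:.3f} reject_all={reject_all:.3f}")
```

Under this assumed definition, a large u_r makes missing a relevant article far more costly than flagging an irrelevant one, which suits a heavily imbalanced collection such as this one.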
Related articles
BC4GO: a full-text corpus for the BioCreative IV GO task
Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparab...
Integrating information retrieval with distant supervision for Gene Ontology annotation
This article describes our participation in the Gene Ontology Curation task (GO task) in BioCreative IV, where we took part in both subtasks: A) identification of GO evidence sentences (GOESs) for relevant genes in full-text articles and B) prediction of GO terms for relevant genes in full-text articles. For subtask A, we trained a logistic regression model to detect GOES based on ...
Gene ontology annotation by density and gravitation models.
Gene Ontology (GO) is developed to provide standard vocabularies of gene products in different databases. The process of annotating GO terms to genes requires curators to read through lengthy articles. Methods for speeding up or automating the annotation process are thus of great importance. We propose a GO annotation approach using full-text biomedical documents for directing more relevant pap...
The Gene Ontology Task at BioCreative IV
Gene Ontology (GO) annotation is a common task among model organism database (MOD) groups. It is a very time-consuming and labor-intensive task, thus often considered as one of the bottlenecks in literature curation. There is a growing need for semi- or fully-automated GO curation techniques that will help database curators rapidly and accurately identify gene function information in full-length ...
Overview of the gene ontology task at BioCreative IV
Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators ...